I  Introduction

There seems to be a consensus that the two main measures in classical mental test theory are the
reliability and validity coefficients of a test. Although these measures have been widely accepted by
psychologists and test users in the past decades, they are actually attributes of a specified group of
examinees as well as of a given test, since the correlation coefficient is used in either case. In addition,
representing these measures by single numbers results in over-simplification and a lack of useful
information for both theorists and actual users of tests. The same applies to the standard error of
measurement.

In latent trait models, the item and test information functions provide us with abundant information
about the local accuracy of estimation, a concept which is totally missing in classical mental test theory.
These functions do not depend upon any specific group of examinees as the reliability coefficient does;
in other words, they are population-free. By virtue of this characteristic, if we add further information
about the MLE bias function of the test and the ability distribution of the examinee group, we can
provide the tailored reliability coefficient and standard error of measurement in the sense of classical
mental test theory for each and every specified group of examinees who have taken the same test (cf.
Samejima, 1977b, 1987).

This progressive dissolution of the reliability coefficient and of the standard error of measurement
in  classical  mental  test  theory  and  their  replacement  by  the  test  information  function  in  latent  trait 
models  is  further  facilitated  by  the  recent  proposal  of  the  modifications  of  the  test  information  function, 
using  the  MLE  bias  function  (cf.  Samejima,  1987,  1990).  In  the  present  paper,  it  will  be  shown  how  we 
can  predict  the  so-called  reliability  coefficient  and  standard  error  of  measurement  of  a  test  in  the  sense 
of  classical  mental  test  theory,  taking  advantage  of  the  new  developments  in  latent  trait  models. 


II  Test  Information  Function  and  Its  Modifications 

Let $\theta$ be ability, or latent trait, which takes on any real number. We assume that there is a set
of $n$ test items measuring $\theta$ whose characteristics are known. Let $g$ denote such an item, $k_g$ be
a discrete item response to item $g$, and $P_{k_g}(\theta)$ denote the operating characteristic of $k_g$, or the
conditional probability assigned to $k_g$, given $\theta$, i.e.,

(2.1)  $P_{k_g}(\theta) = \mathrm{Prob.}[k_g \mid \theta]$ .

We assume that $P_{k_g}(\theta)$ is three-times differentiable with respect to $\theta$. We have for the item response
information function (Samejima, 1972)

(2.2)  $I_{k_g}(\theta) = -\frac{\partial^2}{\partial\theta^2} \log P_{k_g}(\theta)$ ,

and the item information function, $I_g(\theta)$, is defined as the conditional expectation of $I_{k_g}(\theta)$, given
$\theta$, such that

(2.3)  $I_g(\theta) = E[I_{k_g}(\theta) \mid \theta] = \sum_{k_g} I_{k_g}(\theta) P_{k_g}(\theta)$ .

In  the  special  case  where  the  item  g  is  scored  dichotomously,  this  item  information  function  is  simplified 
to  become 

(2.4)  $I_g(\theta) = \left[\frac{\partial}{\partial\theta} P_g(\theta)\right]^2 [\{P_g(\theta)\}\{1 - P_g(\theta)\}]^{-1}$ ,

where $P_g(\theta)$ is the operating characteristic of the correct answer to item $g$. Let $V$ be a response
pattern such that


(2.5)  $V = \{k_g\}'$ ,  $g = 1, 2, \ldots, n$ .

The operating characteristic, $P_V(\theta)$, of the response pattern $V$ is defined as the conditional probability
of $V$, given $\theta$, and by virtue of local independence we can write

(2.6)  $P_V(\theta) = \prod_{k_g \in V} P_{k_g}(\theta)$ .

The response pattern information function (Samejima, 1972), $I_V(\theta)$, is given by

(2.7)  $I_V(\theta) = -\frac{\partial^2}{\partial\theta^2} \log P_V(\theta) = \sum_{k_g \in V} I_{k_g}(\theta)$ ,

and the test information function, $I(\theta)$, is defined as the conditional expectation of $I_V(\theta)$, given $\theta$,
and we obtain from (2.2), (2.3), (2.5), (2.6) and (2.7)

(2.8)  $I(\theta) = E[I_V(\theta) \mid \theta] = \sum_{V} I_V(\theta) P_V(\theta) = \sum_{g=1}^{n} I_g(\theta)$ .

A big advantage of modern mental test theory is that the standard error of estimation can be defined
locally by using $[I(\theta)]^{-1/2}$. Unlike its counterpart in classical mental test theory, this function does
not depend upon the population of examinees, but is solely a property of the test itself, as it should be
if we are to call it the standard error, or the reliability, of a test. It is well known that this function
provides us with the asymptotic standard deviation of the conditional distribution of the maximum
likelihood estimate of $\theta$, given its true value.

Lord has proposed a bias function for the maximum likelihood estimate of $\theta$ in the three-parameter
logistic model whose operating characteristic of the correct answer, $P_g(\theta)$, is given by

(2.9)  $P_g(\theta) = c_g + (1 - c_g)[1 + \exp\{-D a_g (\theta - b_g)\}]^{-1}$ ,

where $a_g$, $b_g$, and $c_g$ are the item discrimination, difficulty, and guessing parameters, and $D$ is a
scaling factor, which is set equal to 1.7 when the logistic model is used as a substitute for the normal
ogive model. Lord's bias function, which is denoted by $B(\hat\theta_V \mid \theta)$ in this paper, can be written as

(2.10)  $B(\hat\theta_V \mid \theta) = D\,[I(\theta)]^{-2} \sum_{g=1}^{n} a_g I_g(\theta)\,[\psi_g(\theta) - \tfrac{1}{2}]$ ,

where

(2.11)  $\psi_g(\theta) = [1 + \exp\{-D a_g (\theta - b_g)\}]^{-1}$
(cf. Lord, 1983). We can see in the above formula of Lord's MLE bias function that the bias should be
negative when $\psi_g(\theta)$ is less than 0.5 for all the items, which is necessarily the case for some interval
of $\theta$, $(-\infty, \underline{\theta})$, and should be positive when $\psi_g(\theta)$ is greater than 0.5 for all the items, which also
necessarily happens for some interval, $(\overline{\theta}, \infty)$, and in between the bias tends to be close to zero, for
the last factor in this formula assumes negative values for some items and positive values for others, and,
therefore, they cancel each other out, provided that the difficulty parameter $b_g$ is widely distributed.
Lord has applied this MLE bias function to an 85-item SAT Verbal test (Lord, 1984), and the result
shows a fairly wide range of $\theta$ in which the bias is practically nil.
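
The sign behaviour described above can be checked numerically. The following is a minimal sketch of Lord's bias function (2.10) for the three-parameter logistic model; the item parameters are illustrative assumptions (thirty equivalent items with no guessing), and the helper names are hypothetical.

```python
import math

D = 1.7  # scaling factor

def psi(theta, a, b):
    """Logistic function of (2.11)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Item information (2.4) under the three-parameter logistic model (2.9)."""
    s = psi(theta, a, b)
    p = c + (1.0 - c) * s
    dp = (1.0 - c) * D * a * s * (1.0 - s)
    return dp * dp / (p * (1.0 - p))

def lord_bias(theta, items):
    """Lord's MLE bias function, formula (2.10)."""
    infos = [item_information(theta, a, b, c) for a, b, c in items]
    total = sum(infos)
    weighted = sum(a * info * (psi(theta, a, b) - 0.5)
                   for (a, b, c), info in zip(items, infos))
    return D * weighted / total ** 2

# Illustrative test: thirty equivalent items, a = 1.0, b = 0.0, c = 0.0.
items = [(1.0, 0.0, 0.0)] * 30
for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(lord_bias(theta, items), 4))
```

With all difficulties equal, every $\psi_g(\theta)$ crosses 0.5 at the same point, so the bias is negative below the common difficulty and positive above it; spreading the $b_g$ values would widen the region in which the terms cancel.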

In  the  general  case  of  discrete  item  responses,  we  obtain  for  the  bias  function  of  the  maximum 
likelihood  estimate  (cf.  Samejima,  1987) 

(2.12)  $B(\hat\theta_V \mid \theta) = E[\hat\theta_V - \theta \mid \theta] = -(1/2)[I(\theta)]^{-2} \sum_{g=1}^{n} \sum_{k_g} A_{k_g}(\theta) P''_{k_g}(\theta)$

        $= -(1/2)[I(\theta)]^{-2} \sum_{g=1}^{n} \sum_{k_g} P'_{k_g}(\theta) P''_{k_g}(\theta) [P_{k_g}(\theta)]^{-1}$ ,

where $A_{k_g}(\theta)$ is the basic function for the discrete item response $k_g$, and $P'_{k_g}(\theta)$ and $P''_{k_g}(\theta)$ denote
the first and second partial derivatives of $P_{k_g}(\theta)$ with respect to $\theta$, respectively. On the graded
response level where item score $x_g$ assumes successive integers, 0 through $m_g$, each $k_g$ in the above
formula must be replaced by the graded item score $x_g$. On the dichotomous response level, it can be
reduced to the form

(2.13)  $B(\hat\theta_V \mid \theta) = E[\hat\theta_V - \theta \mid \theta] = (-1/2)[I(\theta)]^{-2} \sum_{g=1}^{n} I_g(\theta) P''_g(\theta) [P'_g(\theta)]^{-1}$ ,

with $P'_g(\theta)$ and $P''_g(\theta)$ indicating the first and second partial derivatives of $P_g(\theta)$ with respect to
$\theta$, respectively. This formula includes Lord's bias function in the three-parameter logistic model as a
special case.
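
The reduction to Lord's formula can be made explicit with a short worked step. The lines below only combine the three-parameter logistic model (2.9), the logistic function of (2.11), written here as $\psi_g(\theta)$, and the dichotomous bias formula (2.13); they are offered as a check rather than as the derivation given in the cited papers.

```latex
\begin{align*}
P_g(\theta)   &= c_g + (1-c_g)\,\psi_g(\theta), \\
P_g'(\theta)  &= (1-c_g)\,D a_g\,\psi_g(\theta)\{1-\psi_g(\theta)\}, \\
P_g''(\theta) &= (1-c_g)\,D^2 a_g^2\,\psi_g(\theta)\{1-\psi_g(\theta)\}\{1-2\psi_g(\theta)\}, \\
P_g''(\theta)\,[P_g'(\theta)]^{-1} &= D a_g\{1-2\psi_g(\theta)\}, \\
-\tfrac{1}{2}[I(\theta)]^{-2}\sum_{g=1}^{n} I_g(\theta)\,P_g''(\theta)\,[P_g'(\theta)]^{-1}
  &= D\,[I(\theta)]^{-2}\sum_{g=1}^{n} a_g I_g(\theta)\{\psi_g(\theta)-\tfrac{1}{2}\}.
\end{align*}
```

The last line is formula (2.10), so the guessing parameter enters Lord's bias function only through $I_g(\theta)$.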

Using this MLE bias function, a modified test information function, $\Upsilon(\theta)$, has been defined by

(2.14)  $\Upsilon(\theta) = I(\theta)\,[1 + \frac{\partial}{\partial\theta} B(\hat\theta_V \mid \theta)]^{-2}$ ,

which is the reciprocal of an approximate minimum variance bound of the maximum likelihood estimator
(cf. Samejima, 1990). From this formula, we can see that the relationship between this new function and
the original test information function depends upon the first derivative of the MLE bias function. To
be more precise, if the derivative is positive, then the new function will assume a lesser value than the
original test information function. If it is negative, then this relationship will be reversed. If it is zero,
as it is when the MLE is conditionally unbiased, then these two functions will assume the same value.

The second modified test information function, $\Xi(\theta)$, is defined by

(2.15)  $\Xi(\theta) = I(\theta)\,\{[1 + \frac{\partial}{\partial\theta} B(\hat\theta_V \mid \theta)]^{2} + I(\theta)\,[B(\hat\theta_V \mid \theta)]^{2}\}^{-1}$ ,

which is the reciprocal of an approximate minimum bound of the mean squared error of the maximum
likelihood estimator (cf. Samejima, 1990). We can see that the difference between the two modified
test information functions, $\Upsilon(\theta)$ and $\Xi(\theta)$, is the second and last term in the braces of the right hand
side of formula (2.15). Since this term is nonnegative, we have

(2.16)  $\Xi(\theta) \le \Upsilon(\theta)$

throughout the whole range of $\theta$, regardless of the slope of the MLE bias function.

When the MLE bias function of the test is monotone increasing, as is the case with many existing
tests, it is obvious from (2.14) that $\Upsilon(\theta)$ will assume no greater values than those of the original test
information function $I(\theta)$. The same applies to $\Xi(\theta)$, and we have the relationship,

(2.17)  $\Xi(\theta) \le \Upsilon(\theta) \le I(\theta)$ ,

throughout the whole range of $\theta$.
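
The ordering in (2.17) can be illustrated numerically. The sketch below evaluates the original test information function and the two modified functions of (2.14) and (2.15) for a test of thirty equivalent logistic items; the parameter values, the simplified bias expression for equivalent items, and the central-difference slope are all assumptions made for the example, not the computational method of the original study.

```python
import math

D, A, B_DIFF, N_ITEMS = 1.7, 1.0, 0.0, 30   # illustrative equivalent items

def p(theta):
    """Operating characteristic of the correct answer (no guessing)."""
    return 1.0 / (1.0 + math.exp(-D * A * (theta - B_DIFF)))

def info(theta):
    """Test information I(theta) for N_ITEMS equivalent logistic items."""
    q = p(theta)
    return N_ITEMS * (D * A) ** 2 * q * (1.0 - q)

def bias(theta):
    """MLE bias (2.13); with equivalent items it reduces to this closed form."""
    q = p(theta)
    return (q - 0.5) / (N_ITEMS * D * A * q * (1.0 - q))

def bias_slope(theta, h=1e-5):
    """Central-difference approximation to the derivative of the bias."""
    return (bias(theta + h) - bias(theta - h)) / (2.0 * h)

def upsilon(theta):
    """First modified test information function, formula (2.14)."""
    return info(theta) / (1.0 + bias_slope(theta)) ** 2

def xi(theta):
    """Second modified test information function, formula (2.15)."""
    i = info(theta)
    return i / ((1.0 + bias_slope(theta)) ** 2 + i * bias(theta) ** 2)

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta, round(xi(theta), 3), round(upsilon(theta), 3), round(info(theta), 3))
```

For this test the bias function is monotone increasing, so each printed row should be nondecreasing from left to right, with the first two columns coinciding where the bias itself is zero.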


III  Reliability Coefficient of a Test in the Sense of Classical
Mental Test Theory

Although we can handle the concept of reliability much better in modern mental test theory by using
the test information function, $I(\theta)$, or one of its modification formulae, $\Upsilon(\theta)$ or $\Xi(\theta)$, it has also
been observed (Samejima, 1977b) that, if we wish, the reliability coefficient of a test in the sense of
classical mental test theory can be obtained easily from the observed data and the test information
function under a general condition. Since we now have two modification formulae of the test information
function, we are in an even better position to predict the reliability coefficient tailored
for a specified population of examinees.

[III.1]  General Case

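Let $\theta^*_V$ be any estimator of ability $\theta$. We can write

(3.1)  $\theta^*_V = \theta + \varepsilon$ ,

where $\varepsilon$ denotes the error variable. In the test-retest situation, we have

(3.2)  $\theta^*_{V1} = \theta + \varepsilon_1$
       $\theta^*_{V2} = \theta + \varepsilon_2$ ,

where the subscripts, 1 and 2, indicate the test and retest situations, respectively. If we can reasonably
assume that in the test and retest situations:

(3.3)  $\mathrm{Cov.}(\varepsilon_1, \varepsilon_2) = 0$ ,

(3.4)  $\mathrm{Var.}(\varepsilon_1) = \mathrm{Var.}(\varepsilon_2)$

and

(3.5)  $\mathrm{Cov.}(\theta, \varepsilon_1) = \mathrm{Cov.}(\theta, \varepsilon_2) = 0$ ,

then we will have

(3.6)  $\mathrm{Corr.}(\theta^*_{V1}, \theta^*_{V2}) = [\mathrm{Var.}(\theta^*_{V1}) - \mathrm{Var.}(\varepsilon_1)][\mathrm{Var.}(\theta^*_{V1})]^{-1}$ .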
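
The step from assumptions (3.3) through (3.5) to (3.6) can be spelled out as follows, writing $\theta^*_{V1}$ and $\theta^*_{V2}$ for the estimators in the test and retest situations and $\varepsilon_1$ and $\varepsilon_2$ for their errors. This is a routine expansion added for the reader's convenience.

```latex
\begin{align*}
\mathrm{Cov}(\theta^*_{V1},\theta^*_{V2})
  &= \mathrm{Var}(\theta) + \mathrm{Cov}(\theta,\varepsilon_1)
     + \mathrm{Cov}(\theta,\varepsilon_2) + \mathrm{Cov}(\varepsilon_1,\varepsilon_2)
   = \mathrm{Var}(\theta), \\
\mathrm{Var}(\theta^*_{V1})
  &= \mathrm{Var}(\theta) + \mathrm{Var}(\varepsilon_1)
   = \mathrm{Var}(\theta) + \mathrm{Var}(\varepsilon_2)
   = \mathrm{Var}(\theta^*_{V2}), \\
\mathrm{Corr}(\theta^*_{V1},\theta^*_{V2})
  &= \frac{\mathrm{Var}(\theta)}
          {\sqrt{\mathrm{Var}(\theta^*_{V1})\,\mathrm{Var}(\theta^*_{V2})}}
   = \frac{\mathrm{Var}(\theta^*_{V1}) - \mathrm{Var}(\varepsilon_1)}
          {\mathrm{Var}(\theta^*_{V1})}.
\end{align*}
```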


Note that if we replace ability $\theta$ by one of its transformed forms, true test score $T$, and use the
observed test score $X$ as the estimator of $T$ and $E$ as its error of estimation, then (3.1) can be
rewritten in the form

(3.7)  $X = T + E$ ,

which represents the fundamental assumption in classical mental test theory, and (3.6) becomes a
familiar formula for the reliability coefficient $r_{X_1 X_2}$,

(3.8)  $r_{X_1 X_2} = \mathrm{Var.}(T)\,[\mathrm{Var.}(X)]^{-1}$ .

In classical mental test theory, however, researchers seldom check whether these assumptions are acceptable.
In fact, in many cases (3.5) is violated if we replace $\theta$ by $T$, and $\varepsilon_1$ and $\varepsilon_2$ by $E_1$ and $E_2$,
respectively, unless the test has been constructed in such a way that most individuals from the target
population have mediocre true scores.

We  can  write  in  general 


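(3.9)  $\mathrm{Var.}(\varepsilon) = E[\varepsilon - E(\varepsilon)]^2$

       $= E[\varepsilon - E(\varepsilon \mid \theta)]^2 + E[E(\varepsilon \mid \theta) - E(\varepsilon)]^2$
       $+ 2E[\{\varepsilon - E(\varepsilon \mid \theta)\}\{E(\varepsilon \mid \theta) - E(\varepsilon)\}]$ .

This indicates that, if the error variable $\varepsilon$ is conditionally unbiased for the interval of $\theta$ of interest,
so that $E(\varepsilon \mid \theta) = 0$ and hence $E(\varepsilon) = 0$, then (3.9) will be reduced to the form

(3.10)  $\mathrm{Var.}(\varepsilon) = E[\varepsilon^2]$ .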

[III.2]  Maximum  Likelihood  Estimator 

Let $\hat\theta_V$ or $\hat\theta$ denote the maximum likelihood estimator of $\theta$ based upon the response pattern
$V$. If 1) $\hat\theta$ is conditionally unbiased for the interval of $\theta$ of interest and 2) the test information
function $I(\theta)$ assumes reasonably high values for that interval, then we will be able to approximate the
conditional distribution of $\hat\theta$, given $\theta$, by the normal distribution $N(\theta, [I(\theta)]^{-1/2})$ for the interval
of $\theta$ within which the examinees' ability is practically distributed. Thus we have from (3.10)

(3.11)  $\mathrm{Var.}(\varepsilon) = E[\{I(\theta)\}^{-1}]$ .

When this is the case, from (3.6) we can write

(3.12)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - E[\{I(\theta)\}^{-1}]]\,[\mathrm{Var.}(\hat\theta_1)]^{-1}$ .

Thus the reliability coefficient in the sense of classical mental test theory can be predicted from a single
administration of the test, given the test information function $I(\theta)$ and the ability distribution of the
examinees.
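
A minimal sketch of this prediction is given below. It assumes, for illustration only, the thirty-item logistic test of Section V and an ability distribution approximated by normalised normal-density weights on a grid with step 0.05; the variance of the estimator is taken as the variance of ability plus the error variance of (3.11), as assumptions (3.3) through (3.5) imply. The function names and the grid construction are hypothetical.

```python
import math

D, A, B_DIFF, N_ITEMS = 1.7, 1.0, 0.0, 30   # illustrative equivalent items

def info(theta):
    """Test information I(theta) for N_ITEMS equivalent logistic items."""
    q = 1.0 / (1.0 + math.exp(-D * A * (theta - B_DIFF)))
    return N_ITEMS * (D * A) ** 2 * q * (1.0 - q)

def normal_grid(mean, sd, step=0.05, half_width=4.0):
    """Equally spaced points and normalised weights approximating N(mean, sd)."""
    m = int(round(half_width * sd / step))
    points = [mean + k * step for k in range(-m, m + 1)]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in points]
    total = sum(dens)
    return points, [d / total for d in dens]

def predicted_reliability(points, weights, info_fn):
    """Tailored reliability coefficient in the sense of formula (3.12)."""
    mean = sum(w * t for t, w in zip(points, weights))
    var_theta = sum(w * (t - mean) ** 2 for t, w in zip(points, weights))
    err_var = sum(w / info_fn(t) for t, w in zip(points, weights))  # (3.11)
    var_mle = var_theta + err_var   # variance of the estimator under (3.3)-(3.5)
    return (var_mle - err_var) / var_mle

points, weights = normal_grid(0.0, 0.5)
print(round(predicted_reliability(points, weights, info), 3))
```

Passing a modified information function in place of `info` gives the corresponding predictions based on the modification formulae. Because the weights here are normalised densities rather than integer frequencies, the result should be close to, but need not coincide with, the tabulated value for the same distribution.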

It has also been observed that in computerised adaptive testing we can predict the reliability coefficient
if a specified amount of test information is used for the stopping rule for a given level of ability
in each of the test and retest situations, provided that the above two conditions 1) and 2) are met. In
such a case, we can write

(3.13)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - E[\{I_{(1)}(\theta)\}^{-1}]]\,[\mathrm{Var.}(\hat\theta_1)\{\mathrm{Var.}(\hat\theta_1) - E[\{I_{(1)}(\theta)\}^{-1}] + E[\{I_{(2)}(\theta)\}^{-1}]\}]^{-1/2}$ ,

where $I_{(1)}(\theta)$ and $I_{(2)}(\theta)$ are the preset criterion test information functions in the test and retest
situations, respectively, which are adopted as the stopping rules for the two separate situations. Note
that these two criterion test information functions need not be the same, and also that the reliability
coefficient is obtainable from a single administration. In a simplified case where, in each situation, the
same amount of test information is used as the criterion for terminating the presentation of new items
for every examinee, we can rewrite the above formula into the form

(3.14)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - \sigma_1^2]\,[\mathrm{Var.}(\hat\theta_1)\{\mathrm{Var.}(\hat\theta_1) - \sigma_1^2 + \sigma_2^2\}]^{-1/2}$ ,

where $\sigma_1^2$ and $\sigma_2^2$ are the reciprocals of the constant amounts of criterion test information in the
two separate situations, respectively. If we use the same constant amount of test information as the
stopping rule in both the test and retest situations, then the reliability coefficient takes the simplest
form

(3.15)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - \sigma^2]\,[\mathrm{Var.}(\hat\theta_1)]^{-1}$ ,

where $\sigma^2$ denotes the reciprocal of this common constant amount of test information.
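
The constant-information case lends itself to a very small calculation. The sketch below implements the form with separate criterion amounts for the two sessions and checks that it collapses to the simplest form when the two amounts agree; the numerical inputs are hypothetical and chosen only to make the arithmetic transparent.

```python
import math

def adaptive_reliability(var_mle_1, sigma1_sq, sigma2_sq):
    """Predicted test-retest correlation with constant criterion information.

    var_mle_1 : variance of the ability estimates in the first session
    sigma1_sq : reciprocal of the criterion information in the first session
    sigma2_sq : reciprocal of the criterion information in the second session
    """
    var_theta = var_mle_1 - sigma1_sq      # variance of ability itself
    var_mle_2 = var_theta + sigma2_sq      # implied variance in the second session
    return var_theta / math.sqrt(var_mle_1 * var_mle_2)

# Hypothetical values: estimate variance 1.25, criterion information 4 in both sessions.
print(adaptive_reliability(1.25, 0.25, 0.25))
# A stricter criterion (information 10) in the retest session only.
print(adaptive_reliability(1.25, 0.25, 0.10))
```

When the two criterion amounts are equal, the function returns the ratio of the ability variance to the estimate variance, which is the simplest form stated above.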

The appropriateness of the above normal approximation of the conditional distribution of $\hat\theta$, given
$\theta$, can be examined by the Monte Carlo method (cf. Samejima, 1977a). We also notice that a necessary
condition for this approximation is that $\hat\theta$ is conditionally unbiased for the interval of $\theta$ of interest.
Thus we can use the MLE bias function, which was introduced in Section II, to check whether the
approximation is supported. Note that the MLE bias function together with the ability distribution of the
target population also determines whether the assumption described by (3.5) should be accepted.
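
If the conditional unbiasedness is not supported, i.e., if $B(\hat\theta_V \mid \theta)$ does not approximately equal
zero for all values of $\theta$ in the interval of interest, however, then we shall be able to adopt one of the
modified test information functions, $\Upsilon(\theta)$ or $\Xi(\theta)$. Thus we can rewrite (3.12) into the forms

(3.16)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - E[\{\Upsilon(\theta)\}^{-1}]]\,[\mathrm{Var.}(\hat\theta_1)]^{-1}$

and

(3.17)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - E[\{\Xi(\theta)\}^{-1}]]\,[\mathrm{Var.}(\hat\theta_1)]^{-1}$ .

We can decide which of the modified formulae, (3.16) or (3.17), is more appropriate to use in a specified
situation. Also in computerised adaptive testing, either $\Upsilon(\theta)$ or $\Xi(\theta)$ can be used as the stopping
rule in place of the test information function $I(\theta)$, and we can revise (3.13) into the forms

(3.18)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - E[\{\Upsilon_{(1)}(\theta)\}^{-1}]]\,[\mathrm{Var.}(\hat\theta_1)\{\mathrm{Var.}(\hat\theta_1) - E[\{\Upsilon_{(1)}(\theta)\}^{-1}] + E[\{\Upsilon_{(2)}(\theta)\}^{-1}]\}]^{-1/2}$

and

(3.19)  $\mathrm{Corr.}(\hat\theta_1, \hat\theta_2) = [\mathrm{Var.}(\hat\theta_1) - E[\{\Xi_{(1)}(\theta)\}^{-1}]]\,[\mathrm{Var.}(\hat\theta_1)\{\mathrm{Var.}(\hat\theta_1) - E[\{\Xi_{(1)}(\theta)\}^{-1}] + E[\{\Xi_{(2)}(\theta)\}^{-1}]\}]^{-1/2}$ ,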


where  the  subscripts  (1)  and  (2)  represent  the  test  and  retest  situations,  respectively. 


IV  Standard Error of Measurement of a Test in the Sense of
Classical Mental Test Theory

In classical mental test theory, the standard error of estimation of ability is represented by a single
number, which is heavily affected by the degree of heterogeneity of the group of examinees tested, as
is the case with the reliability coefficient. In contrast, in latent trait models, the standard error of
estimation is locally defined, i.e., as a function of ability, which is the reciprocal of the square root of
the test information function. Since the test information function does not depend upon any specific group
of examinees, but is solely a property of the test itself, this locally defined standard error is much more
appropriate than the standard error of estimation in classical mental test theory. This function also
indicates that no test is efficient in ability measurement for the entire range of ability, and that each test
provides us with large amounts of information only locally, which agrees with what we know about tests.

The standard error of measurement of a test tailored for a specific ability distribution is given by

(4.1)  $\mathrm{S.E.} = E[\{I(\theta)\}^{-1/2}]$

when the conditions 1) and 2) described in the preceding section are met, and by

(4.2)  $\mathrm{S.E.}_1 = E[\{\Upsilon(\theta)\}^{-1/2}]$

or

(4.3)  $\mathrm{S.E.}_2 = E[\{\Xi(\theta)\}^{-1/2}]$

otherwise.
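
The tailored standard error is simply the expectation of the local standard error over the ability distribution. The sketch below computes it for the original test information function, again assuming the thirty-item logistic test of Section V and a normal ability distribution approximated by normalised weights on an equally spaced grid; these choices and the function names are illustrative assumptions.

```python
import math

D, A, B_DIFF, N_ITEMS = 1.7, 1.0, 0.0, 30   # illustrative equivalent items

def info(theta):
    """Test information I(theta) for N_ITEMS equivalent logistic items."""
    q = 1.0 / (1.0 + math.exp(-D * A * (theta - B_DIFF)))
    return N_ITEMS * (D * A) ** 2 * q * (1.0 - q)

def normal_grid(mean, sd, step=0.05, half_width=4.0):
    """Equally spaced points and normalised weights approximating N(mean, sd)."""
    m = int(round(half_width * sd / step))
    points = [mean + k * step for k in range(-m, m + 1)]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in points]
    total = sum(dens)
    return points, [d / total for d in dens]

def tailored_standard_error(points, weights, info_fn):
    """Expected local standard error over the ability distribution, as in (4.1)."""
    return sum(w / math.sqrt(info_fn(t)) for t, w in zip(points, weights))

for mean in (0.0, -0.8, -1.6, -2.4):
    points, weights = normal_grid(mean, 0.5)
    print(mean, round(tailored_standard_error(points, weights, info), 3))
```

Supplying one of the modified information functions as `info_fn` gives the analogues of (4.2) and (4.3). The values increase as the ability distribution moves away from the region where the test is most informative.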


V  Examples 

For the purpose of illustration, six ability distributions are hypothesized, and for a single test
predictions are made for their tailored reliability coefficients and tailored standard errors of measurement
in the sense of classical mental test theory, using (3.12), (3.16), (3.17), (4.1), (4.2) and (4.3). These six
hypothetical ability distributions are normal distributions, i.e., $N(0.0, 1.0)$, $N(-0.8, 1.0)$, $N(0.0, 0.5)$,
$N(-0.8, 0.5)$, $N(-1.6, 0.5)$ and $N(-2.4, 0.5)$. Figure 5-1 presents the density functions of these six
distributions. The hypothetical test consists of thirty equivalent dichotomous items, which follow the
logistic model represented by (2.9) with $c_g = 0.0$, and the common parameter values $a_g = 1.0$ and
$b_g = 0.0$, respectively, with the scaling factor $D$ set equal to 1.7. Figure 5-2 presents the MLE bias
function of this hypothetical test. We can see in this figure that outside the interval of $\theta$, $(-1.0, 1.0)$,
the amount of bias is substantially large. The square roots of the test information function $I(\theta)$ and
of its two modification formulae $\Upsilon(\theta)$ and $\Xi(\theta)$ of this test are shown in Figure 5-3.

Tables 5-1 and 5-2 present the resulting predicted reliability coefficients and standard errors of
measurement for the six different ability distributions, respectively. In each table, the mean and the
variance of $\theta$ of each of the six distributions are also given. We can see that these variances are
slightly different from the squares of the second parameters of the normal distributions, i.e., 0.98322
vs. 1.00000 for the populations 1 and 2, and 0.25155 vs. 0.25000 for the populations 3, 4, 5 and 6,
respectively, whereas all of the means are the same as the first parameters of the normal distributions.
These discrepancies in variance come from the fact that we used frequencies for the equally spaced
points of $\theta$ with the step width 0.05, which are given as integers, in order to approximate the normal
distributions, instead of using the density functions themselves.

As you can see in the first table, the predicted reliability coefficient obtained by (3.12) varies
widely, i.e., from 0.200 to 0.896. The coefficient decreases as the main part of the
distribution shifts from a range of $\theta$ where the amount of test information is greater to another
range where it is lesser. The reduction is more conspicuous when the standard deviation of the normal
distribution is smaller. The predicted reliability coefficient obtained by (3.16) using $\Upsilon(\theta)$ instead
of $I(\theta)$ indicates a substantial reduction from the one obtained by (3.12) for each of the six ability
distributions. The reduction is especially conspicuous for the populations 2, 5, and 6, whose ability
is distributed on lower levels of $\theta$ where the discrepancies between $I(\theta)$ and $\Upsilon(\theta)$ are large. Among the
six populations the predicted reliability coefficient obtained by means of (3.16) varies from 0.012 to
0.781, showing an even larger range than that obtained by (3.12). Similar results were obtained for the
predicted reliability coefficient given by (3.17), using $\Xi(\theta)$ instead of $I(\theta)$. The reliability coefficient
varies from 0.011 to 0.766, and within each population the reduction in the value of the reliability
coefficient from the one obtained by (3.16) is relatively small, as is expected from Figure 5-3.

As for the standard error of measurement, we can see in Table 5-2 that similar results were obtained,
only in reversed order, of course. In classical mental test theory, the standard error of measurement
$\sigma_E$ is given by

FIGURE 5-1

Density Functions of Six Hypothetical Ability Distributions; n(0.0, 1.0),
n(-0.8, 1.0), n(0.0, 0.5), n(-0.8, 0.5), n(-1.6, 0.5) and n(-2.4, 0.5).

FIGURE 5-3

Square Roots of the Original (Solid Line) and the Two Modified (Dashed and Dotted Lines)
Test Information Functions of the Hypothetical Test of Thirty Equivalent Items Following
the Logistic Model with $a_g = 1.0$ and $b_g = 0.0$ As the Common Parameters.


TABLE 5-1

Three Predicted Reliability Coefficients Tailored for Each of the Six Hypothetical Ability
Distributions, Using the Original Test Information Function and Its Two Modification
Formulae. The Indices, 1, 2 and 3, Represent the Original Test Information Function,
Modification Formula No. 1 and Modification Formula No. 2, Respectively. The
Mean and the Variance of $\theta$ for Each Population Are Also Given.

POPULATION   RELIABILITY   RELIABILITY   RELIABILITY   MEAN OF     VARIANCE
                  1             2             3         THETA      OF THETA

    1          0.89641       0.78053       0.76629      0.00000     0.98322
    2          0.82324       0.26479       0.25256     -0.80000     0.98322
    3          0.81738       0.80074       0.79920      0.00000     0.25155
    4          0.73250       0.66611       0.65589     -0.80000     0.25155
    5          0.47715       0.21681       0.20093     -1.60000     0.25155
    6          0.20049       0.01182       0.01109     -2.40000     0.25155

TABLE 5-2

Three Predicted Standard Errors of Measurement Tailored for Each of the Six Hypothetical
Ability Distributions, Using the Original Test Information Function and Its Two
Modification Formulae. The Indices, 1, 2 and 3, Represent the Original Test
Information Function, Modification Formula No. 1 and Modification
Formula No. 2, Respectively. The Mean and the Variance of $\theta$ for
Each Population Are Also Given.

POPULATION   STAND. ERROR   STAND. ERROR   STAND. ERROR   MEAN OF     VARIANCE
                  1              2              3          THETA      OF THETA

    1          0.30548        0.37648        0.38514       0.00000     0.98322
    2          0.37887        0.64293        0.66397      -0.80000     0.98322
    3          0.23521        0.24717        0.24811       0.00000     0.25155
    4          0.29172        0.32802        0.33326      -0.80000     0.25155
    5          0.48839        0.73440        0.76583      -1.60000     0.25155
    6          0.91974        2.76394        2.88922      -2.40000     0.25155



TABLE 5-3

Three Theoretical Variances of the Maximum Likelihood Estimates of $\theta$ for Each
of the Six Hypothetical Ability Distributions, Using the Original Test Information
Function and Its Two Modification Formulae. The Indices, 1, 2 and 3, Represent
the Original Test Information Function, Modification Formula No. 1 and
Modification Formula No. 2, Respectively. The Mean and the Variance
of $\theta$ for Each Population Are Also Given.

POPULATION   VARIANCE     VARIANCE     VARIANCE     MEAN OF     VARIANCE
             OF MLE 1     OF MLE 2     OF MLE 3      THETA      OF THETA

    1         1.09684      1.25968      1.28308      0.00000     0.98322
    2         1.19432      3.71324      3.89296     -0.80000     0.98322
    3         0.30775      0.31414      0.31475      0.00000     0.25155
    4         0.34341      0.37763      0.38352     -0.80000     0.25155
    5         0.52718      1.16023      1.25189     -1.60000     0.25155
    6         1.25469     21.28788     22.68190     -2.40000     0.25155

TABLE 5-4

Three Theoretical Error Variances for Each of the Six Hypothetical Ability Distributions,
Using the Original Test Information Function and Its Two Modification Formulae. The
Indices, 1, 2 and 3, Represent the Original Test Information Function, Modification
Formula No. 1 and Modification Formula No. 2, Respectively. The Mean and the
Variance of $\theta$ for Each Population Are Also Given.

POPULATION   VARIANCE      VARIANCE      VARIANCE      MEAN OF     VARIANCE
             OF ERROR 1    OF ERROR 2    OF ERROR 3     THETA      OF THETA

    1         0.11363       0.27646       0.29987       0.00000     0.98322
    2         0.21111       2.73003       2.90974      -0.80000     0.98322
    3         0.05620       0.06260       0.06320       0.00000     0.25155
    4         0.09186       0.12609       0.13197      -0.80000     0.25155
    5         0.27563       0.90868       1.00034      -1.60000     0.25155
    6         1.00314      21.03633      22.43035      -2.40000     0.25155

TABLE 5-5

Reliability Coefficient Computed for Each of the Six Hypothetical Ability Distributions Based
upon the Maximum Likelihood Estimates of the Examinees for Test-Retest Situations Using
a Test of Thirty Equivalent Items Following the Logistic Model with $a_g = 1.0$
and $b_g = 0.0$. The Means and Variances of the Two Sessions and the Covariance Are
Also Presented.

POPULATION   RELIABILITY    MEAN        MEAN       VARIANCE   VARIANCE   COVARIANCE
                              1           2            1          2

    1          0.90788     -0.00311     0.00106     1.19069    1.16769     1.07051
    2          0.88812     -0.81435    -0.80971     1.07982    1.09703     0.96663
    3          0.80724      0.00785    -0.00754     0.33578    0.33443     0.27051
    4          0.72334     -0.85777    -0.84349     0.40504    0.39310     0.28863
    5          0.55304     -1.68722    -1.67511     0.42299    0.40820     0.22980
    6          0.32187     -2.28115    -2.25897     0.21639    0.23189     0.07210

(5.1)  $\sigma_E = [\mathrm{Var.}(X)]^{1/2}\,[1 - r_{X_1 X_2}]^{1/2}$ ,

where, as before, $r_{X_1 X_2}$ indicates the reliability coefficient. Comparison of Table 5-1 and Table 5-2
reveals that there are substantial discrepancies between the values of $\sigma_E$ obtained by formula (5.1)
using the tailored reliability coefficients in Table 5-1, which are based upon the maximum likelihood
estimate $\hat\theta$, in place of $r_{X_1 X_2}$ in (5.1) and the corresponding standard errors of measurement, which
were obtained by formulae (4.1) through (4.3) and presented in Table 5-2. To give some examples, for
Population No. 1 the results of (5.1) are: 0.319, 0.465 and 0.479, respectively; for Population No.
3 they are: 0.214, 0.224 and 0.225; and for Population No. 6 they are: 0.448, 0.499 and 0.499.
These results are understandable, for the degree of violation of the assumptions behind classical
mental test theory is different for the separate ability distributions.
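
The quoted results of (5.1) can be reproduced from the tabulated values. The sketch below applies the classical formula, the square root of the variance times the square root of one minus the reliability, using the variance of $\theta$ and the three tailored reliability coefficients listed in Table 5-1 for Populations No. 1, No. 3 and No. 6. The dictionary layout and function name are illustrative; using the variance of $\theta$ as the variance term is inferred from the fact that it reproduces the quoted figures.

```python
import math

def classical_sem(variance, reliability):
    """Classical standard error of measurement, formula (5.1)."""
    return math.sqrt(variance) * math.sqrt(1.0 - reliability)

# (variance of theta, three tailored reliability coefficients) from Table 5-1
table_5_1 = {
    1: (0.98322, (0.89641, 0.78053, 0.76629)),
    3: (0.25155, (0.81738, 0.80074, 0.79920)),
    6: (0.25155, (0.20049, 0.01182, 0.01109)),
}

results = {
    pop: [classical_sem(variance, r) for r in reliabilities]
    for pop, (variance, reliabilities) in table_5_1.items()
}
for pop, values in results.items():
    print(pop, [round(v, 3) for v in values])
```

Setting these figures beside the corresponding rows of Table 5-2 (for example 0.30548, 0.37648 and 0.38514 for Population No. 1) shows the discrepancies discussed above.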

The three theoretical variances of the maximum likelihood estimate of $\theta$ and the three theoretical
error variances are presented in Tables 5-3 and 5-4, respectively, for each of the six hypothetical
populations. The latter were obtained by (3.11) and by replacing $I(\theta)$ in (3.11) by $\Upsilon(\theta)$ and $\Xi(\theta)$,
respectively, and the former are the sum of these separate error variances and the variance of $\theta$.

In order to satisfy our curiosity, a simulation study has been made in such a way that, following
each of the six ability distributions, a group of examinees is hypothesised, and using the Monte Carlo
method a response pattern of each hypothetical subject is produced for each of the test and retest
situations. Since our test consists of thirty equivalent dichotomous test items, the simple test score is a
sufficient statistic for the response pattern, and the maximum likelihood estimate of $\theta$ can be obtained
from this sufficient statistic. The numbers of hypothetical subjects are 1,998 for Populations No. 1
and No. 2, and 2,004 for Populations No. 3, No. 4, No. 5 and No. 6. The correlation coefficient
between the two sets of $\hat\theta$'s was computed, and the results are presented in Table 5-5. Comparison of
each of these results with the corresponding three tailored reliability coefficients in Table 5-1 gives the
impression that, overall, these correlation coefficients are higher than the predicted tailored reliability
coefficients. This enhancement comes from the fact that in each distribution there are a certain number
of subjects who obtained negative or positive infinity as $\hat\theta$, and we have replaced these negative and
positive infinities by more or less arbitrary values, $-2.65$ and $2.65$, respectively, in computing the
correlation coefficients. Since in Population No. 3 none of the 2,004 hypothetical subjects got negative
or positive infinity for their maximum likelihood estimates of $\theta$ in the first session, and only three got
negative infinity and none got positive infinity in the second session, this result, 0.807, will be the
most trustworthy value. We can see that this value, 0.807, is less than 0.817 obtained by using the
original test information function $I(\theta)$, and a little greater than 0.801 based upon the Modification
Formula No. 1, $\Upsilon(\theta)$. The next most trustworthy value may be 0.723 of Population No. 4, for which
none of the 2,004 subjects obtained positive infinity as their $\hat\theta$'s in either of the two sessions, and 56
and 45 got negative infinity in the first and second sessions, respectively. This value of the correlation
coefficient, 0.723, is a little less than the predicted reliability coefficient 0.733 based upon $I(\theta)$,
but somewhat greater than 0.666, which is based upon $\Upsilon(\theta)$, the Modification Formula No. 1; here the
artificial enhancement is already visible. The numbers of subjects who obtained negative and positive
infinities in the first session and in the second session are: 56, 47, 43 and 49 for Population No. 1;
197, 4, 195 and 6 for Population No. 2; 437, 0, 399 and 0 for Population No. 5; and 1,143,
0, 1,118 and 0 for Population No. 6. We must say that, for these four distributions, the values
of the correlation coefficient in Table 5-5 should not be taken too seriously, for these values are enhanced
because of the involvement of too many substitute values for negative and positive infinities.


VI  Discussion  and  Conclusions 

Test information function $I(\theta)$ and its two modification formulae, $\Upsilon(\theta)$ and $\Xi(\theta)$, are used to
predict the reliability coefficient and the standard error of measurement which are tailored for each specific
ability distribution. Examples are given and a simulation study has been conducted for comparison.

These  examples  have  been  rather  intentionally  chosen  to  make  the  differences  among  the  separate 
ability  distributions,  and  among  the  three  predicted  indices  for  each  ability  distribution,  clearly  visible, 
using  equivalent  test  items. 

Since  we  have  more  useful  and  informative  measures  like  the  test  information  function  and  its  two 
modified formulae, the reliability coefficient of a test is no longer necessary in modern mental test
theory.  And  yet  it  is  interesting  to  know  how  to  predict  the  coefficient  using  these  functions,  which  are 
tailored  for  each  separate  population  of  examinees.  In  this  process,  it  will  become  more  obvious  that 
the  traditional  concept  of  test  reliability  is  misleading,  for  without  changing  the  test  the  coefficient  can 
be  drastically  different  if  we  change  the  population  of  examinees. 